Efficient Cepstral Normalization For Robust Speech Recognition
Abstract
In this paper we describe and compare the performance of a series of cepstrum-based procedures that enable the CMU SPHINX-II speech recognition system to maintain a high level of recognition accuracy over a wide variety of acoustical environments. We describe the MFCDCN algorithm, an environment-independent extension of the efficient SDCN and FCDCN algorithms developed previously. We compare the performance of these algorithms with the very simple RASTA and cepstral mean normalization procedures, describing their performance in the context of the 1992 DARPA CSR evaluation using secondary microphones and in the DARPA stress-test evaluation.

1. INTRODUCTION

The need for speech recognition systems and spoken language systems to be robust with respect to their acoustical environment has become more widely appreciated in recent years (e.g. [1]). Results of many studies have demonstrated that even automatic speech recognition systems designed to be speaker independent can perform very poorly when they are tested using a different type of microphone or acoustical environment from the one with which they were trained (e.g. [2,3]), even in a relatively quiet office environment. Applications such as speech recognition over telephones, in automobiles, on a factory floor, or outdoors demand an even greater degree of environmental robustness.

Many approaches have been considered in the development of robust speech recognition systems, including techniques based on autoregressive analysis, the use of special distortion measures, the use of auditory models, and the use of microphone arrays, among many others (as reviewed in [1,4]). In this paper we describe and compare the performance of a series of cepstrum-based procedures that enable the CMU SPHINX-II speech recognition system to maintain a high level of recognition accuracy over a wide variety of acoustical environments.
The most recently developed algorithm is multiple fixed codeword-dependent cepstral normalization (MFCDCN). MFCDCN is an extension of a similar algorithm, FCDCN, which provides an additive environmental compensation to cepstral vectors, but in an environment-specific fashion [5]. MFCDCN is less computationally complex than the earlier CDCN algorithm and more accurate than the related SDCN and BSDCN algorithms [6], and it does not require domain-specific training for new acoustical environments. In this paper we describe the performance of MFCDCN and related algorithms, and we compare them with the popular RASTA approach to robustness.

2. EFFICIENT CEPSTRUM-BASED COMPENSATION TECHNIQUES

In this section we describe several of the cepstral normalization techniques we have developed to compensate simultaneously for additive noise and linear filtering. Most of these algorithms are completely data-driven: the compensation parameters are determined by comparing speech from the testing environment with simultaneously recorded speech samples from the DARPA standard close-talking Sennheiser HMD-414 microphone (referred to as the CLSTLK microphone in this paper). The remaining algorithm, codeword-dependent cepstral normalization (CDCN), is model-based: the speech input to the recognition system is characterized as speech from the CLSTLK microphone that has undergone unknown linear filtering and corruption by unknown additive noise.

In addition, we discuss two other procedures, the RASTA method and cepstral mean normalization, which may be referred to as cepstral-filtering techniques. These procedures do not provide as much improvement as CDCN, MFCDCN, and related algorithms, but they can be implemented with virtually no computational cost.

2.1. Cepstral Normalization Techniques

SDCN.
The simplest compensation algorithm, SNR-Dependent Cepstral Normalization (SDCN) [2,4], applies an additive correction in the cepstral domain that depends exclusively on the instantaneous SNR of the signal. This correction vector equals the average difference in cepstra...
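The SDCN description above is cut off. The sketch below follows only what the text states: there is one additive cepstral correction per instantaneous-SNR value, and it is estimated from cepstra recorded simultaneously by the CLSTLK microphone and the test microphone. Several details are our own illustrative assumptions and are not taken from the paper:

- SNR is approximated as the frame's c[0] minus a noise c[0] estimate.
- SNR values are quantized with a `bin_width` parameter.
- A frame whose SNR bin was never seen in training uses the nearest trained bin.

For contrast, `cepstral_mean_normalize` shows the usual per-utterance form of the cepstral mean normalization mentioned earlier. Treat all of this as a hedged sketch, not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def snr_bin(frame: Sequence[float], noise_c0: float, bin_width: float = 1.0) -> int:
    # Assumption: instantaneous SNR ~ how far the frame's c[0] (log energy)
    # lies above a noise c[0] estimate, quantized into bins of bin_width.
    return int((frame[0] - noise_c0) // bin_width)


def train_sdcn(clean: Sequence[Vector], noisy: Sequence[Vector],
               noise_c0: float, bin_width: float = 1.0) -> Dict[int, Vector]:
    """Average (clean - noisy) cepstral difference for each SNR bin.

    clean[t] and noisy[t] are the same frame recorded simultaneously by the
    CLSTLK microphone and by the test-environment microphone.
    """
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for x, z in zip(clean, noisy):
        k = snr_bin(z, noise_c0, bin_width)  # SNR is measured on the test-side frame
        acc = sums.setdefault(k, [0.0] * len(x))
        for i, (xi, zi) in enumerate(zip(x, z)):
            acc[i] += xi - zi
        counts[k] += 1
    return {k: [s / counts[k] for s in acc] for k, acc in sums.items()}


def apply_sdcn(noisy: Sequence[Vector], corrections: Dict[int, Vector],
               noise_c0: float, bin_width: float = 1.0) -> List[Vector]:
    """Add the correction vector selected by each frame's SNR bin."""
    compensated = []
    for z in noisy:
        k = snr_bin(z, noise_c0, bin_width)
        if k not in corrections:
            # Hypothetical fallback: borrow the nearest SNR bin seen in training.
            nearest = min(corrections, key=lambda b: abs(b - k))
            k = nearest
        r = corrections[k]
        compensated.append([zi + ri for zi, ri in zip(z, r)])
    return compensated


def cepstral_mean_normalize(frames: Sequence[Vector]) -> List[Vector]:
    """Subtract the utterance-level mean of each cepstral coefficient."""
    n = len(frames)
    mean = [sum(col) / n for col in zip(*frames)]
    return [[c - m for c, m in zip(f, mean)] for f in frames]
```

At recognition time, each SDCN correction is a table lookup followed by one addition per coefficient, which keeps the run-time cost low. Training still requires stereo recordings from the new environment, which is what makes this family data-driven. Cepstral mean normalization needs no stereo data at all, but it applies the same offset to every frame regardless of SNR.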
Publication date: 1993